4  Scaling out

03_scaling_out

Author

Ryan Wesslen

Let’s now begin scaling out our examples.

5 basic_grid_search.py

This example showcases a simple grid search in one dimension, where we try different parameters for a model and pick the one with the best results on a holdout set.

5.1 Defining the image

First, let’s build a custom image and install scikit-learn in it.

import modal

app = modal.App(
    "example-basic-grid-search",
    image=modal.Image.debian_slim().pip_install("scikit-learn~=1.2.2"),
)

5.2 The Modal function

Next, define the function. Note that we use the custom image with scikit-learn in it. We also take the hyperparameter k, which is how many nearest neighbors we use.

@app.function()
def fit_knn(k):
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    clf = KNeighborsClassifier(k)
    clf.fit(X_train, y_train)
    score = float(clf.score(X_test, y_test))
    print("k = %3d, score = %.4f" % (k, score))
    return score, k
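The function returns (score, k) with the score first, which makes picking the best setting simple: Python compares tuples element by element, so max over the results selects the highest score. The sketch below illustrates this locally. `fake_fit_knn` is a hypothetical stand-in with made-up scores, not a real model fit; on Modal, the results would instead come from fanning out the real function, for example with `fit_knn.map(...)` inside a local entrypoint.

```python
def fake_fit_knn(k):
    """Hypothetical stand-in for fit_knn: returns (score, k) with made-up scores."""
    # Toy score that peaks at k = 3; this is NOT a real model evaluation.
    score = 1.0 - 0.01 * abs(k - 3)
    return score, k


# Locally we use the built-in map; on Modal this would be fit_knn.map(range(1, 10)).
results = list(map(fake_fit_knn, range(1, 10)))

# Tuples compare element by element, so max() selects the highest score.
best_score, best_k = max(results)
print("Best k = %3d, score = %.4f" % (best_k, best_score))
```

If two values of k tied on score, max would break the tie in favor of the larger k, since k is the second tuple element.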

6 fetch_stock_prices.py

TBD

7 Practice problem

Let’s test out what we’ve learned by creating a new script.

For this, we’ll adapt another scikit-learn tutorial (Gradient Boosting Regularization), looping through a parameter (the sample size, n) and, for each value, saving a matplotlib image using the same approach as fetch_stock_prices.py.

This exercise is inspired by a recent [:probabl. video](https://youtu.be/oZm4nGN8YMg) by Vincent Warmerdam that explored this tutorial in more detail.

7.1 Initialize the App

Let’s first name the app and create the initial image.

import io
import os

import modal

app = modal.App(
    "example-boosting-regularization",
    image=modal.Image.debian_slim()
    .pip_install("scikit-learn~=1.2.2")
    .pip_install("matplotlib~=3.9.0"),
)

We needed to install matplotlib since we’re calling it in our function.

7.2 Define function

For our function, we’ll use:

@app.function()
def fit_boosting(n):
    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn import datasets, ensemble
    from sklearn.metrics import log_loss
    from sklearn.model_selection import train_test_split

    X, y = datasets.make_hastie_10_2(n_samples=n, random_state=1)

    # map labels from {-1, 1} to {0, 1}
    labels, y = np.unique(y, return_inverse=True)

    # note: test_size changed from 0.8 (in the scikit-learn demo) to 0.2
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    original_params = {
        "n_estimators": 500,
        "max_leaf_nodes": 4,
        "max_depth": None,
        "random_state": 2,
        "min_samples_split": 5,
    }

    plt.figure()

    for label, color, setting in [
        ("No shrinkage", "orange", {"learning_rate": 1.0, "subsample": 1.0}),
        ("learning_rate=0.2", "turquoise", {"learning_rate": 0.2, "subsample": 1.0}),
        ("subsample=0.5", "blue", {"learning_rate": 1.0, "subsample": 0.5}),
        (
            "learning_rate=0.2, subsample=0.5",
            "gray",
            {"learning_rate": 0.2, "subsample": 0.5},
        ),
        (
            "learning_rate=0.2, max_features=2",
            "magenta",
            {"learning_rate": 0.2, "max_features": 2},
        ),
    ]:
        params = dict(original_params)
        params.update(setting)

        clf = ensemble.GradientBoostingClassifier(**params)
        clf.fit(X_train, y_train)

        # compute test set deviance
        test_deviance = np.zeros((params["n_estimators"],), dtype=np.float64)

        for i, y_proba in enumerate(clf.staged_predict_proba(X_test)):
            test_deviance[i] = 2 * log_loss(y_test, y_proba[:, 1])

        plt.plot(
            (np.arange(test_deviance.shape[0]) + 1)[::5],
            test_deviance[::5],
            "-",
            color=color,
            label=label,
        )

    plt.legend(loc="upper right")
    plt.xlabel("Boosting Iterations")
    plt.ylabel("Test Set Deviance")

    # Dump the chart to .png and return the bytes
    with io.BytesIO() as buf:
        plt.savefig(buf, format="png", dpi=300)
        return buf.getvalue()

This is mostly the scikit-learn demo, with a few modifications:

  • we changed the test_size from 0.8 to 0.2
  • we parameterized the sample size n, which we’ll loop through
  • we return the chart as PNG bytes, similar to fetch_stock_prices.py
  • we increased the number of boosting iterations from 400 to 500
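The test set deviance in fit_boosting is computed as twice the log loss of the predicted class-1 probabilities. To make that quantity concrete, here is a minimal pure-Python sketch. The helper name `binary_deviance` and the labels and probabilities are made up for illustration, and unlike scikit-learn's `log_loss` this sketch does no clipping of extreme probabilities.

```python
import math


def binary_deviance(y_true, p_pred):
    """Return 2 * mean negative log-likelihood for binary labels in {0, 1}."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Probability the model assigned to the label that actually occurred.
        p_observed = p if y == 1 else 1.0 - p
        total += -math.log(p_observed)
    return 2.0 * total / len(y_true)


# Made-up labels and class-1 probabilities, for illustration only.
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
uninformed = [0.5, 0.5, 0.5, 0.5]

print(binary_deviance(y_true, confident))
print(binary_deviance(y_true, uninformed))  # equals 2 * ln(2)
```

Lower values mean the predicted probabilities sit closer to the true labels, which is why a lower curve in the chart indicates a better fit on the test set.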

Last, we’ll define the local_entrypoint as:

OUTPUT_DIR = "/tmp/modal"


@app.local_entrypoint()
def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    for n in [1000, 5000, 10000, 20000, 50000]:
        plot = fit_boosting.remote(n)
        filename = os.path.join(OUTPUT_DIR, f"boosting_{n}.png")
        print(f"saving data to {filename}")
        with open(filename, "wb") as f:
            f.write(plot)

This saves each of the images into the folder /tmp/modal.

So let’s now run this:

$ modal run boosting_regularization.py
 Initialized. View run at https://modal.com/charlotte-llm/main/apps/ap-xxxxxxxxxx
 Created objects.
├── 🔨 Created mount /modal-examples/03_scaling_out/boosting_regularization.py
└── 🔨 Created function fit_boosting.
saving data to /tmp/modal/boosting_1000.png
saving data to /tmp/modal/boosting_5000.png
saving data to /tmp/modal/boosting_10000.png
saving data to /tmp/modal/boosting_20000.png
saving data to /tmp/modal/boosting_50000.png
Stopping app - local entrypoint completed.
 App completed. View run at https://modal.com/charlotte-llm/main/apps/ap-xxxxxxxxxx

We can view a few of the images. For example, this is n = 5000:

This one is particularly interesting because of the subsample=0.5 curve, which generally follows No shrinkage but then jumps up. It’s not clear why, but it’s a curious case.

Alternatively, let’s look at n = 10000:

Now we see a result consistent with Vincent’s video: all curves smooth out, and no shrinkage learns quickly and then levels out. Even after 500 iterations, no shrinkage has a lower deviance, which indicates a better out-of-sample fit.

Let’s last look at n = 50000:

The curves are very similar again, but this time the advantage of no shrinkage is even larger: through 500 iterations, the gap between no shrinkage and the shrinkage settings is wider.

Another nice thing about Modal is that we can also view the remote logs:

Not surprisingly, our last execution (n = 50000) took the longest, at about 4 minutes and 23 seconds. This is helpful to keep in mind, and we’ll rely on these logs more as we begin to run more computationally intensive examples.
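Since the loop in main calls fit_boosting.remote(n) one sample size at a time, the runs happen back to back. One way to scale this out is to launch all sizes at once and pair each result with its input. The sketch below shows that pattern locally with a thread pool. `fake_fit_boosting` is a hypothetical stand-in that returns placeholder bytes rather than a real chart, and the output goes to a throwaway temporary directory.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fake_fit_boosting(n):
    """Hypothetical stand-in for fit_boosting: returns placeholder bytes, not a real PNG."""
    return f"chart for n={n}".encode()


sizes = [1000, 5000, 10000, 20000, 50000]
output_dir = tempfile.mkdtemp()  # throwaway directory for this sketch

# Run all sizes concurrently; pool.map yields results in input order,
# so zip() pairs each result with the n that produced it.
with ThreadPoolExecutor(max_workers=len(sizes)) as pool:
    plots = list(pool.map(fake_fit_boosting, sizes))

saved = []
for n, plot in zip(sizes, plots):
    filename = os.path.join(output_dir, f"boosting_{n}.png")
    with open(filename, "wb") as f:
        f.write(plot)
    saved.append(filename)

print(saved)
```

On Modal, the analogous change would likely be to replace the loop over fit_boosting.remote(n) with fit_boosting.map(sizes), which, as far as I know, also returns results in input order by default. The total wall-clock time would then be closer to the slowest single run than to the sum of all runs.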